Creating a Large-Scale Arabic to French Statistical Machine Translation System

Authors

  • Sasa Hasan
  • Anas El Isbihani
  • Hermann Ney
Abstract

In this work, we present the creation of a large-scale Arabic to French statistical machine translation system. We describe all necessary steps, from corpus acquisition and data preprocessing to training and optimizing the system and its eventual evaluation. Since no corpora existed previously, we collected large amounts of data from the web. Arabic word segmentation was crucial to reduce the overall number of unknown words. We describe the phrase-based SMT system used for training and for generating the translation hypotheses. Results on the second CESTA evaluation campaign, which was set in the medical domain, are reported. The prototype reaches a favorable BLEU score of 40.8%.

Similar resources

A Multi-Genre SMT System for Arabic to French

This work presents improvements of a large-scale Arabic to French statistical machine translation system over a period of three years. The development includes better preprocessing, more training data, additional genre-specific tuning for different domains, namely newswire text and broadcast news transcripts, and improved domain-dependent language models. Starting with an early prototype in 200...

Translation Model Adaptation for an Arabic/French News Translation System by Lightly-Supervised Training

Most of the existing, easily available parallel texts to train a statistical machine translation system are from international organizations that use a particular jargon. In this paper, we consider the automatic adaptation of such a translation model to the news domain. The initial system was trained on more than 200M words of UN bitexts. We then explore large amounts of in-domain monolingual t...

The need to create a media block for the convergence of overseas news networks

As a general diplomacy arm of the Islamic Republic of Iran, VoSiMa has extensive activities in international broadcasting of its radio and television programs. These programs are broadcast in different languages, such as English, French, Azeri, Arabic, and ... for regional and transnational audiences. The large volume of the organization's international activities is in the form of news and new...

Large and Diverse Language Models for Statistical Machine Translation

This paper presents methods to combine large language models trained from diverse text sources and applies them to a state-of-the-art French–English and Arabic–English machine translation system. We show gains of over 2 BLEU points over a strong baseline by using continuous space language models in re-ranking.

A Human-Aided Machine Translation System for Japanese-English Patent Translation

The approach presented here enables Japanese users with no knowledge of English or legal English to generate patent claims in English from a Japanese-only interface. It exploits the highly determined structure of patent claims and merges Natural Language Generation (NLG) and Machine Translation (MT) techniques and resources as realized in the AutoPat and PC-Transfer applications. Due to its tun...

Publication date: 2006